19 research outputs found

    Keyword Spotting for Hearing Assistive Devices Robust to External Speakers

    Keyword spotting (KWS) is experiencing an upswing due to the pervasiveness of small electronic devices that allow interaction with them via speech. Often, KWS systems are speaker-independent, which means that any person (user or not) might trigger them. For applications like KWS for hearing assistive devices this is unacceptable, as only the user must be allowed to handle them. In this paper we propose KWS for hearing assistive devices that is robust to external speakers. A state-of-the-art deep residual network for small-footprint KWS is taken as the basis to build upon. By following a multi-task learning scheme, this system is extended to jointly perform KWS and users' own-voice/external speaker detection with a negligible increase in the number of parameters. For experiments, we generate from the Google Speech Commands Dataset a speech corpus emulating hearing aids as a capturing device. Our results show that this multi-task deep residual network is able to achieve a relative KWS accuracy improvement of around 32% with respect to a system that does not deal with external speakers.

    Shouted Speech Compensation for Speaker Verification Robust to Vocal Effort Conditions

    The performance of speaker verification systems degrades when vocal effort conditions differ between enrollment and test (e.g., shouted vs. normal speech). This situation can arise in non-cooperative speaker verification tasks. In this paper, we present a study on different methods for linear compensation of embeddings that make use of Gaussian mixture models to cluster shouted and normal speech domains. These compensation techniques are borrowed from the area of robustness for automatic speech recognition and, in this work, we apply them to compensate for the mismatch between shouted and normal conditions in speaker verification. Before compensation, the shouted condition is automatically detected by means of logistic regression. The process is computationally light and is performed in the back-end of an x-vector system. Experimental results show that applying the proposed approach in the presence of vocal effort mismatch yields up to 13.8% equal error rate relative improvement with respect to a system that applies neither shouted speech detection nor compensation.

    Dual-Channel Speech Enhancement Based on Extended Kalman Filter Relative Transfer Function Estimation

    This paper deals with speech enhancement in dual-microphone smartphones using beamforming along with postfiltering techniques. The performance of these algorithms relies on a good estimation of the acoustic channel and speech and noise statistics. In this work we present a speech enhancement system that combines the estimation of the relative transfer function (RTF) between microphones using an extended Kalman filter framework with a novel speech presence probability estimator intended to track the noise statistics' variability. The available dual-channel information is exploited to obtain more reliable estimates of clean speech statistics. Noise reduction is further improved by means of postfiltering techniques that take advantage of the speech presence estimation. Our proposal is evaluated in different reverberant and noisy environments when the smartphone is used in both close-talk and far-talk positions. The experimental results show that our system achieves improvements in terms of noise reduction, low speech distortion and better speech intelligibility compared to other state-of-the-art approaches. Spanish MINECO/FEDER Project TEC2016-80141-P. Spanish Ministry of Education through the National Program FPU under Grant FPU15/0416

    Filterbank Learning for Small-Footprint Keyword Spotting Robust to Noise

    In the context of keyword spotting (KWS), the replacement of handcrafted speech features by learnable features has not yielded superior KWS performance. In this study, we demonstrate that filterbank learning outperforms handcrafted speech features for KWS whenever the number of filterbank channels is severely decreased. Reducing the number of channels might cause some drop in KWS performance, but it also substantially reduces energy consumption, which is key when deploying common always-on KWS on low-resource devices. Experimental results on a noisy version of the Google Speech Commands Dataset show that filterbank learning adapts to noise characteristics to provide a higher degree of robustness to noise, especially when dropout is integrated. Thus, switching from typically used 40-channel log-Mel features to 8-channel learned features leads to a relative KWS accuracy loss of only 3.5% while simultaneously achieving a 6.3x energy consumption reduction.

    Clinical Profile and Determinants of Mortality in Patients with Interstitial Lung Disease Admitted for COVID-19.

    BACKGROUND: Concern has risen about the effects of COVID-19 in interstitial lung disease (ILD) patients. The aim of our study was to determine clinical characteristics and prognostic factors of ILD patients admitted for COVID-19. METHODS: Ancillary analysis of an international, multicenter COVID-19 registry (HOPE: Health Outcome Predictive Evaluation) was performed. The subgroup of ILD patients was selected and compared with the rest of the cohort. RESULTS: A total of 114 patients with ILDs were evaluated. Mean ± SD age was 72.4 ± 13.6 years, and 65.8% were men. ILD patients were older, had more comorbidities, received more home oxygen therapy and more frequently had respiratory failure upon admission than non-ILD patients (all p < 0.05). In laboratory findings, ILD patients more frequently had elevated LDH, C-reactive protein, and D-dimer levels (all p < 0.05). A multivariate analysis showed that chronic kidney disease and respiratory insufficiency on admission were predictors of ventilatory support, and that older age, kidney disease and elevated LDH were predictors of death. CONCLUSIONS: Our data show that ILD patients admitted for COVID-19 are older, have more comorbidities, more frequently require ventilatory support and have higher mortality than those without ILDs. Older age, kidney disease and LDH were independent predictors of mortality in this population.

    Robust Speech Recognition on Intelligent Mobile Devices with Dual-Microphone

    The objective is to develop a new set of dual-channel algorithms that exploit the information provided by a secondary microphone in order to improve automatic speech recognition accuracy on intelligent mobile devices used in everyday noisy environments. Further aims are to generate new speech resources under a dual-channel mobile device framework for experimental purposes, and to evaluate our developments and compare them with other state-of-the-art techniques in order to draw conclusions that allow further progress. Thesis, Univ. Granada. Official doctoral program in Information and Communication Technologies

    A Novel Loss Function and Training Strategy for Noise-Robust Keyword Spotting


    Improved Vocal Effort Transfer Vector Estimation for Vocal Effort-Robust Speaker Verification

    Despite the maturity of modern speaker verification technology, its performance still degrades significantly when facing non-neutrally-phonated (e.g., shouted and whispered) speech. To address this issue, we propose a new speaker embedding compensation method based on a minimum mean square error (MMSE) estimator. This method models the joint distribution of the vocal effort transfer vector and non-neutrally-phonated embedding spaces and operates in a principal component analysis domain to cope with non-neutrally-phonated speech data scarcity. Experiments are carried out using a cutting-edge speaker verification system integrating a powerful self-supervised pre-trained model for speech representation. In comparison with a state-of-the-art embedding compensation method, the proposed MMSE estimator yields superior and competitive equal error rate results when tackling shouted and whispered speech, respectively.

    Análisis de propuestas de Educación Física en casa durante la suspensión de clases por la COVID-19 y orientaciones para su diseño en Educación Primaria

    The education system is attempting to respond effectively to the instructional changes and challenges caused by COVID-19. The transition from predominantly face-to-face teaching to virtual teaching has demanded substantial effort from Physical Education (PE) teachers in adapting the teaching and learning process. To the best of our knowledge, no studies have analyzed the different at-home PE units designed by teachers. This research aimed to examine, from a curricular perspective, distinct at-home PE proposals in order to ascertain their characteristics, share instances of good teaching practice, and provide teachers with useful guidelines to help them design quality proposals in the future. The results showed that the predominant activity profile was an individual motor exercise focused on physical fitness and introduced as a challenge, in which students repeat a specific movement sequence with the aid of the internet.

    Dual-channel eKF-RTF framework for speech enhancement with DNN-based speech presence estimation

    This paper presents a dual-channel speech enhancement framework that effectively integrates deep neural network (DNN) mask estimators. Our framework follows a beamforming-plus-postfiltering approach intended for noise reduction on dual-microphone smartphones. An extended Kalman filter is used for the estimation of the relative acoustic channel between microphones, while the noise estimation is performed using a speech presence probability estimator. We propose the use of a DNN estimator to improve the prediction of the speech presence probabilities without making any assumption about the statistics of the signals. We evaluate and compare different dual-channel features to improve the accuracy of this estimator, including the power and phase difference between the speech signals at the two microphones. The proposed integrated scheme is evaluated in different reverberant and noisy environments when the smartphone is used in both close- and far-talk positions. The experimental results show that our approach achieves significant improvements in terms of speech quality, intelligibility, and distortion when compared to other approaches based only on statistical signal processing. Spanish Ministry of Science and Innovation Project No. PID2019-104206GB-I00/AEI/10.13039/501100011033. Spanish Ministry of Universities through the National Program FPU (grant reference FPU15/04161)